National Repository of Grey Literature: 10 records found
STATISTICAL LANGUAGE MODELS BASED ON NEURAL NETWORKS
Mikolov, Tomáš ; Zweig, Geoffrey (referee) ; Hajič, Jan (referee) ; Černocký, Jan (advisor)
Statistical language models are an important part of many successful applications, such as automatic speech recognition and machine translation (a well-known example is Google Translate). Traditional techniques for estimating these models are based on so-called N-grams. These techniques have known shortcomings, and research groups across many fields have made an enormous effort to improve on them (speech recognition, machine translation, neuroscience, artificial intelligence, natural language processing, data compression, psychology, etc.). Despite this, N-grams have essentially remained the most successful technique. The aim of this thesis is to present several language model architectures based on neural networks. These models are computationally more demanding than N-gram models. However, the techniques developed in this thesis make it possible to use them efficiently in real-world applications. Compared with the best N-gram models, the achieved reduction in the number of speech recognition errors reaches 20%. The model based on a recurrent neural network achieves the best published results on a very well-known dataset (Penn Treebank).
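The recurrent architecture mentioned above can be illustrated with a minimal, untrained Elman-style RNN language model in plain Python. This is a sketch under stated assumptions, not the thesis's implementation. The sigmoid hidden layer and softmax output follow the common simple-RNN formulation. The class name `TinyRNNLM`, the layer sizes and the random initialization are hypothetical. Training (e.g. backpropagation through time) is omitted.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores):
    # Subtract the maximum score for numerical stability
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def random_matrix(rng, rows, cols):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class TinyRNNLM:
    """Hypothetical, untrained Elman-style RNN language model (illustration only)."""

    def __init__(self, vocab_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        self.U = random_matrix(rng, hidden_size, vocab_size)   # one-hot word -> hidden
        self.W = random_matrix(rng, hidden_size, hidden_size)  # previous hidden -> hidden
        self.V = random_matrix(rng, vocab_size, hidden_size)   # hidden -> output scores

    def step(self, word_id, hidden):
        """One time step: h_t = sigmoid(U x_t + W h_{t-1}), y_t = softmax(V h_t)."""
        recurrent = matvec(self.W, hidden)
        # U multiplied by a one-hot vector is simply column `word_id` of U
        new_hidden = [sigmoid(row[word_id] + r) for row, r in zip(self.U, recurrent)]
        probs = softmax(matvec(self.V, new_hidden))
        return probs, new_hidden

    def log_prob(self, word_ids):
        """Natural-log probability of word_ids[1:], each word given its predecessors."""
        hidden = [0.0] * self.hidden_size
        total = 0.0
        for current, following in zip(word_ids, word_ids[1:]):
            probs, hidden = self.step(current, hidden)
            total += math.log(probs[following])
        return total


model = TinyRNNLM(vocab_size=5, hidden_size=3)
probs, hidden = model.step(0, [0.0] * 3)
```

The hidden state carries information from all previous words, unlike an N-gram model's fixed-length history. Because the weights here are random, `model.log_prob([0, 1, 2])` returns a valid but meaningless score until the model is trained.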
Processing of User Reviews
Cihlářová, Dita ; Burget, Radek (referee) ; Bartík, Vladimír (advisor)
People very often buy goods on the Internet that they cannot see or try out, so they rely on reviews by other customers. However, there may be too many reviews for a person to process quickly and comfortably. The aim of this work is to offer an application that recognizes two things in Czech reviews: which product features are commented on most, and whether the comments are positive or negative. The results can save e-shop customers a lot of time and provide interesting feedback to product manufacturers.
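The feature-and-polarity summary described above can be sketched with a simple lexicon-based approach. This is an illustration under stated assumptions, not the application from the thesis. The aspect and sentiment lexicons are tiny hypothetical examples, and sentences are split naively on punctuation. Each feature mentioned in a sentence inherits that sentence's overall polarity. Real Czech text would also need lemmatization, because words such as "baterie" inflect.

```python
import re
from collections import defaultdict

# Hypothetical toy lexicons; real Czech input would first need lemmatization
ASPECTS = {"baterie": "battery", "displej": "display", "cena": "price"}
POSITIVE = {"skvělý", "skvělá", "dobrý", "dobrá", "výborný", "výborná"}
NEGATIVE = {"špatný", "špatná", "slabý", "slabá", "drahý", "drahá"}


def summarize(reviews):
    """Count positive/negative mentions per product feature, most-commented first."""
    stats = defaultdict(lambda: {"positive": 0, "negative": 0})
    for review in reviews:
        for sentence in re.split(r"[.!?]+", review.lower()):
            tokens = re.findall(r"\w+", sentence)
            # Sentence polarity = positive word count minus negative word count
            score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
            if score == 0:
                continue  # neutral or mixed sentence: not counted
            polarity = "positive" if score > 0 else "negative"
            for token in tokens:
                if token in ASPECTS:
                    stats[ASPECTS[token]][polarity] += 1
    return sorted(stats.items(),
                  key=lambda item: -(item[1]["positive"] + item[1]["negative"]))


result = summarize(["Baterie je skvělá. Displej je slabý.", "Baterie je výborná!"])
```

With these two sample reviews, the battery comes out as the most-commented feature with two positive mentions. The display follows with one negative mention.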
Language Modelling for German
Tlustý, Marek ; Bojar, Ondřej (advisor) ; Hana, Jiří (referee)
The thesis deals with language modelling for German, focusing on the specifics of the German language that are troublesome for standard n-gram models. First, statistical methods of language modelling are described and the relevant phenomena of German are explained. The thesis then proposes its own variants of n-gram language models aimed at addressing these problems. The models are trained both with standard n-gram methods and with the maximum entropy method using n-gram features. The two approaches are compared in two ways: by their correlation with hand-evaluated sentence fluency, and by an automatic metric, perplexity. Their computational requirements are also compared. Next, the thesis presents a set of its own features that count grammatical errors in selected phenomena. The success of these features is verified by their ability to predict the hand-evaluated fluency. Two kinds of models are used for this: maximum entropy models, and the author's own models, which classify using only the medians of the phenomenon values computed from the training data.
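The perplexity evaluation mentioned above can be sketched for a bigram model. The thesis does not state its smoothing method here. Add-one (Laplace) smoothing is an assumption chosen for brevity; practical systems usually use better methods such as Kneser-Ney. The function names and the toy German data are hypothetical, and a closed vocabulary is assumed.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Collect history (unigram) and bigram counts with sentence boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    vocab = {"</s>"}  # every token the model can predict
    for sentence in sentences:
        tokens = ["<s>"] + sentence + ["</s>"]
        vocab.update(sentence)
        unigrams.update(tokens[:-1])  # counts of words used as a history
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams, vocab


def perplexity(sentences, unigrams, bigrams, vocab):
    """exp of the average negative log-probability per predicted token."""
    log_sum, n_predictions = 0.0, 0
    for sentence in sentences:
        tokens = ["<s>"] + sentence + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            # Add-one smoothed P(word | prev)
            p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(vocab))
            log_sum += math.log(p)
            n_predictions += 1
    return math.exp(-log_sum / n_predictions)


train = [["der", "Hund"], ["der", "Hund"]]
counts = train_bigram(train)
pp_train = perplexity(train, *counts)  # approximately 5/3: every bigram gets (2 + 1) / (2 + 3)
pp_scrambled = perplexity([["Hund", "der"]], *counts)
```

Lower perplexity means the model finds the text more predictable. The scrambled word order therefore scores worse than the training sentences. This is the kind of signal the thesis correlates with human fluency judgements.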
A Fast and Trainable Tokenizer for Natural Languages (Rychlý a trénovatelný tokenizér pro přirozené jazyky)
Maršík, Jiří ; Bojar, Ondřej (advisor) ; Spousta, Miroslav (referee)
In this thesis, we present a data-driven system for disambiguating token and sentence boundaries. The implemented system is highly configurable and versatile, to the point that its tokenization abilities allow it to segment unbroken Chinese text. The tokenizer relies on maximum entropy classifiers and requires a sample of tokenized and segmented text as training data. The program is accompanied by a tool for reporting tokenization performance, which helps to rapidly develop and tune the tokenization process. The system was built using only multi-platform libraries, with emphasis on speed and correctness. The thesis first surveys other tools for text tokenization and segmentation and gives a short introduction to maximum entropy modelling. A large part of it then focuses on the particular implementation we developed and its evaluation.
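A binary maximum entropy classifier (equivalent to logistic regression) can decide whether a period ends a sentence. The feature set, the function names and the training sentence are hypothetical illustrations, not the tokenizer's actual design. Training here uses plain stochastic gradient ascent on the log-likelihood. Maxent toolkits commonly use iterative scaling or quasi-Newton optimizers instead.

```python
import math


def features(tokens, i):
    """Hypothetical binary features for the '.' at tokens[i]."""
    prev = tokens[i - 1] if i > 0 else ""
    nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
    return {
        "bias",
        f"prev={prev.lower()}",
        f"next_is_upper={nxt[:1].isupper()}",
        f"prev_len={min(len(prev), 4)}",
    }


def prob_boundary(weights, feats):
    """P(sentence boundary | features) = sigmoid of the summed feature weights."""
    z = sum(weights.get(f, 0.0) for f in feats)
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=200, lr=0.5):
    weights = {}
    for _ in range(epochs):
        for feats, label in examples:
            error = label - prob_boundary(weights, feats)
            for f in feats:
                # Gradient of the log-likelihood w.r.t. a binary feature weight
                weights[f] = weights.get(f, 0.0) + lr * error
    return weights


tokens = ["Mr", ".", "Smith", "arrived", ".", "Then", "he", "left", "."]
examples = [(features(tokens, 1), 0), (features(tokens, 4), 1), (features(tokens, 8), 1)]
weights = train(examples)
```

After training, the period after "Mr" gets a boundary probability below 0.5. The period after "arrived" gets one above 0.5, even though both are followed by a capitalized word.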
Probability density determination by means of Gibbs entropy probability density
Náprstek, Jiří ; Fischer, Cyril
A method for investigating the random response of a nonlinear dynamical system is discussed. In particular, the probability density of the response of a single- or multi-degree-of-freedom (SDOF/MDOF) system is investigated. Multiple stable equilibrium states are considered, with possible snap-through jumps between them. The system is Hamiltonian with weak damping and is excited by a set of non-stationary Gaussian white noises. The solution is based on the Gibbs principle of maximum probability entropy and can be employed in various branches of engineering. The search for the extremum of the Gibbs entropy functional is formulated as a constrained optimization problem. The secondary constraints follow either from the Fokker-Planck equation (FPE) for the system considered or from the system of ordinary differential equations for the stochastic moments of the response derived from the relevant FPE.
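The constrained maximization of the Gibbs entropy can be illustrated with the standard Lagrange-multiplier derivation. The constraints below are generic: moment functions $g_k$ with prescribed values $\mu_k$. The symbols $w$, $g_k$, $\mu_k$ and $\lambda_k$ are introduced here for illustration. They are not the paper's notation, and the paper derives its actual constraints from the FPE.

```latex
\max_{w}\; S[w] = -\int w(\mathbf{x}) \ln w(\mathbf{x}) \, d\mathbf{x}
\quad \text{s.t.} \quad
\int w(\mathbf{x}) \, d\mathbf{x} = 1, \qquad
\int g_k(\mathbf{x}) \, w(\mathbf{x}) \, d\mathbf{x} = \mu_k, \quad k = 1, \dots, K

\mathcal{L}[w] = -\int w \ln w \, d\mathbf{x}
  - \lambda_0 \Big( \int w \, d\mathbf{x} - 1 \Big)
  - \sum_{k=1}^{K} \lambda_k \Big( \int g_k \, w \, d\mathbf{x} - \mu_k \Big)

\frac{\delta \mathcal{L}}{\delta w}
  = -\ln w(\mathbf{x}) - 1 - \lambda_0 - \sum_{k=1}^{K} \lambda_k \, g_k(\mathbf{x}) = 0
\;\Longrightarrow\;
w(\mathbf{x}) = \exp\Big( -1 - \lambda_0 - \sum_{k=1}^{K} \lambda_k \, g_k(\mathbf{x}) \Big)
```

The multipliers $\lambda_0, \dots, \lambda_K$ are then chosen so that the constraints hold. For example, with a scalar $x$, $g_1 = x$ and $g_2 = x^2$, the result is a Gaussian density (for $\lambda_2 > 0$).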
Sentiment analysis of social networks
Zaplatílek, Jan ; Jelínek, Ivan (advisor) ; Bruckner, Tomáš (referee)
This thesis is concerned with sentiment analysis, specifically sentiment analysis of social networks. The main goal of sentiment analysis is to determine whether a given document expresses any sentiment and, if so, whether it is positive or negative. The main reason for performing sentiment analysis on social networks is to detect sentiment and feelings about a company or brand. This activity is called brand monitoring. Information acquired from brand monitoring can be used to improve marketing or communication with customers. This thesis deals with sentiment analysis of posts from the public Facebook profiles of several Czech banks and telecommunication operators. Its goal is to create a model that determines the sentiment of Facebook posts with a precision of at least 80%. The method used to achieve this goal is an experiment. The first part of the thesis describes the theory of sentiment analysis: its definition, problems, methods, reasons for use and use cases. The second part reviews methods and data sources commonly used for sentiment analysis in research abroad. Finally, the third part describes the experiment, its preparation and its results. The main contribution of this thesis is a model that can later be used in the real world.
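The abstract does not say which classifier the thesis uses, so the following is only an assumed baseline. It is a multinomial Naive Bayes sentiment classifier with add-one smoothing, plus the precision measure that the 80% target refers to. The class name, the labels `"pos"`/`"neg"` and the four Czech training posts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"\w+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (hypothetical baseline)."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        best_class, best_score = None, -math.inf
        for c in self.classes:
            total = sum(self.word_counts[c].values())
            score = math.log(self.doc_counts[c] / n_docs)  # log prior
            for w in tokenize(text):
                if w in self.vocab:  # ignore words never seen in training
                    score += math.log((self.word_counts[c][w] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_class, best_score = c, score
        return best_class


def precision(predicted, gold, positive_label):
    """Share of items predicted as positive_label that truly have that label."""
    hits = [g for p, g in zip(predicted, gold) if p == positive_label]
    if not hits:
        return 0.0
    return sum(g == positive_label for g in hits) / len(hits)


posts = ["skvělá služba", "super rychlý internet", "hrozná podpora", "pomalý a drahý internet"]
labels = ["pos", "pos", "neg", "neg"]
model = NaiveBayes().fit(posts, labels)
```

Precision must be measured on posts held out from training. Checking whether a model reaches a target such as 80% on the training data itself would overestimate its quality.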
